Fix monoexonic transcripts filtering #5

Open
wants to merge 1 commit into
base: master


@chbk chbk commented Sep 22, 2022

When filtering transcripts, monoexonic transcripts that have no overlaps in other annotations are deleted.

To detect overlaps, the algorithm sorts all monoexonic transcripts by chromosome, strand, and position, then compares each transcript's start position to the previous transcript's end position. If transcript n overlaps transcript n-1, it is added to the current group; otherwise a new group is created.

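This approach fails when transcripts n and n-1 do not overlap each other yet both overlap transcript n-2. Because transcript n is compared only to transcript n-1, it starts a new group even though it belongs in the same group as transcript n-2.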
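The failure can be reproduced with a minimal sketch. It assumes transcripts are `(start, end)` tuples on a single chromosome and strand, already sorted by start position, with inclusive coordinates. The function name `group_by_previous_end` is hypothetical and is not taken from the project's code.

```python
def group_by_previous_end(transcripts):
    """Assign group ids by comparing each start to the previous transcript's end.

    Hypothetical re-creation of the behaviour described above, not the
    project's actual code. Assumes input sorted by start position.
    """
    groups = []
    group = 0
    previous_end = -1
    for start, end in transcripts:
        # Only the immediately preceding transcript is considered.
        if start > previous_end:
            group += 1
        groups.append(group)
        previous_end = end
    return groups


# 14-20 does not overlap 11-13, but both overlap 10-18.
print(group_by_previous_end([(10, 18), (11, 13), (14, 20)]))  # [1, 1, 2]
```

The third transcript lands in group 2 although it overlaps the first one, which is the situation described above.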

This PR fixes the issue by comparing each transcript's start position to the previous maximum end position, that is, the largest end position among the preceding transcripts, instead of the previous transcript's end position. The previous maximum end position is reset when the chromosome or the strand changes.
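A minimal sketch of the modified grouping, under stated assumptions: transcripts are `(chromosome, strand, start, end)` tuples with inclusive coordinates, a transcript overlaps when its start is not greater than the previous maximum end, and `-1` is the reset value. The function name `group_monoexonic` is hypothetical and is not the project's actual code.

```python
def group_monoexonic(transcripts):
    """Group overlapping transcripts using the running maximum end position.

    Returns a list of (transcript, group) pairs in sorted order.
    Illustrative sketch only; names and data layout are assumptions.
    """
    result = []
    group = 0
    current_key = None
    max_end = -1
    # Sort by chromosome, strand, start (and end as a tie-breaker).
    for record in sorted(transcripts):
        chromosome, strand, start, end = record
        if (chromosome, strand) != current_key:
            # Reset when the chromosome or the strand changes.
            current_key = (chromosome, strand)
            max_end = -1
        if start > max_end:
            # No overlap with any preceding transcript: open a new group.
            group += 1
        result.append((record, group))
        max_end = max(max_end, end)
    return result


example = [("1", "+", 10, 18), ("1", "+", 11, 13), ("1", "+", 14, 20)]
print([g for _, g in group_monoexonic(example)])  # [1, 1, 1]
```

Tracking the maximum end rather than the last end keeps a long transcript "open" while shorter transcripts nested inside it are processed.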

Example:

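| chromosome | strand | start | end | previous maximum end | group (before PR) | group (after PR) |
|---|---|---|---|---|---|---|
| 1 | + | 5 | 9 | -1 | 1 | 1 |
| 1 | + | 10 | 18 | 9 | 2 | 2 |
| 1 | + | 11 | 13 | 18 | 2 | 2 |
| 1 | + | 14 | 20 | 18 | 3 | 2 |
| 1 | + | 15 | 17 | 20 | 3 | 2 |
| 1 | + | 21 | 25 | 20 | 4 | 3 |
| 1 | - | 19 | 24 | -1 | 5 | 4 |
| 1 | - | 22 | 26 | 24 | 5 | 4 |
| 1 | - | 25 | 27 | 26 | 5 | 4 |
| 2 | + | 5 | 9 | -1 | 6 | 5 |
| 2 | + | 10 | 18 | 9 | 7 | 6 |
| 2 | + | 12 | 14 | 18 | 7 | 6 |
| 2 | + | 15 | 19 | 18 | 8 | 6 |
| 2 | + | 20 | 23 | 19 | 9 | 7 |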
